Group-specific cellular metabolism in Medulloblastoma

Background Cancer metabolism influences multiple aspects of tumorigenesis and causes diversity across malignancies. Although comprehensive research has extended our knowledge of molecular subgroups in medulloblastoma (MB), discrete analysis of metabolic heterogeneity is currently lacking. This study seeks to improve our understanding of metabolic phenotypes in MB and their impact on patients’ outcomes. Methods Data from four independent MB cohorts encompassing 1,288 patients were analysed. We explored metabolic characteristics of 902 patients (ICGC and MAGIC cohorts) on bulk RNA level. Moreover, data from 491 patients (ICGC cohort) were searched for DNA alterations in genes regulating cell metabolism. To determine the role of intratumoral metabolic differences, we examined single-cell RNA-sequencing (scRNA-seq) data from 34 additional patients. Findings on metabolic heterogeneity were correlated to clinical data. Results Established MB groups exhibit substantial differences in metabolic gene expression. By employing unsupervised analyses, we identified three clusters of group 3 and 4 samples with distinct metabolic features in ICGC and MAGIC cohorts. Analysis of scRNA-seq data confirmed our results of intertumoral heterogeneity underlying the according differences in metabolic gene expression. On DNA level, we discovered clear associations between altered regulatory genes involved in MB development and lipid metabolism. Additionally, we determined the prognostic value of metabolic gene expression in MB and showed that expression of genes involved in metabolism of inositol phosphates and nucleotides correlates with patient survival. Conclusion Our research underlines the biological and clinical relevance of metabolic alterations in MB. Thus, distinct metabolic signatures presented here might be the first step towards future metabolism-targeted therapeutic options. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s12967-023-04211-6.


Bulk RNA-seq analysis of ICGC and MAGIC cohorts
and the R package pheatmap (2). The input dataset was limited to a set of 2,071 metabolism-related genes (3), and the clustering method was set to "complete". Silhouette method was conducted using R package cluster (4) to determine an optimal number of clusters. In addition to the silhouette method, we considered the previous classification of samples, visual inspection of the dendrogram splitting and corresponding heatmap patterns. Despite an optimal number of three clusters, seven clusters were created in total to be able to focus on differences between established MB groups. Further data pre-processing and annotation were conducted with basic R functions. For each cluster, differentially expressed genes (DEGs) were calculated with DESeq2(5), and Kaplan-Meier curves were plotted for all samples with available survival data (n=87) with the R package survminer (6) and default parameters.
One cluster was excluded after DEG analysis. In the initial quality control described above, it was part of a group of borderline samples (Additional file 1: Fig. S1A), which were not excluded to avoid a further reduction of cohort size. As no matching cluster was found in the larger MAGIC validation cohort, this cluster was excluded from further 3 analysis. Due to the unavailability of clinical data for most samples of this cluster, the according cluster had not been included in the statistical analysis.

MAGIC
Pre-processed normalised log2 gene expression values for 763 medulloblastoma RNA microarray samples were downloaded from the MAGIC cohort dataset GSE85217 and restricted to the 2,071 chosen metabolism-related genes that were used for the preparation of the ICGC cohort. A heatmap was created with pheatmap using the ward.D clustering function. The definition of clusters was implemented considering the results of the silhouette method, gene expression patterns, and the heatmap's dendrogram splitting in line with the ICGC cohort, resulting in six clusters in total. DEGs between clusters were calculated with the R/Bioconductor package limma (7). As with the ICGC cohort, Kaplan-Meier curves were plotted in R for samples with fully available survival data (n=612) using survminer and default parameters.
For both ICGC and MAGIC cohorts, bulk RNA heatmaps were also constructed using a second set of 1,771 genes compiled from metabolic pathway signatures from Rosario et al. (8) to validate our findings. Genes encoded by the mitochondrial DNA were excluded from the validation of bulk RNA and nDNA analyses but are still part of the OXPHOS gene signature used in maxstat analysis described below. Intersect function in R was employed to calculate overlapping and unique elements of both gene lists.

Statistical Analysis
ICGC MB metadata was extracted for the parameters age (continuous and paediatric vs adult), progression-free (PFS) and overall survival (OS), tumour dissemination stage (M-stage), gender, recurrence and death cases. R was used to conduct Shapiro 4 tests for normality per cluster and parameter. Kruskal-Wallis and pairwise Wilcoxon rank-sum tests were applied to test for significant differences in age (continuous), PFS and OS, and M-stage. A Chi-square test was used to test variables gender, age (paediatric vs adult), recurrence and death cases for independence, while a pairwise Fisher test was conducted to compare the observed proportions of two clusters.
Additionally, for age (paediatric vs adult), recurrence and death cases, observed proportions of cases were compared to the average proportion of the dataset with exact binomial test (for n<30) and I-sample proportions test with continuity correction (for n≥30). All statistical tests were calculated with basic R functions (significance level 0.05, adjustment method Benjamini-Hochberg).

Functional analysis
For all clusters (bulk RNA and single-cell RNA), IPA canonical pathway analysis of DEGs was performed. Due to the uneven distribution of high and low expressed DEGs among clusters (Additional file 1: Fig. S3B), analysis was limited to upregulated DEGs (logFC>0). Resulting pathways were sorted by p-value and filtered for relevant pathways based on IPA classification (focus on metabolism) and existing literature.
In line with the previous approach, two sets of DEG lists were explored for both cohorts.
Highly expressed DEGs (logFC≥1) comparing all clusters were examined for ICGC and MAGIC cohorts and comparing only G3/G4 clusters for the MAGIC cohort (Additional file 1: Fig. S4). The restriction to DEGs with a logFC≥1 was done due to a limitation to the number of genes Metascape can analyse simultaneously. When only ICGC G3/G4 clusters are compared, only a few DEGs meet this criterium for I_G3/4.3.
Computational deconvolution of bulk RNA samples from both cohorts was conducted with R v4.1.3(10) and MCP-counter version 1.2.0 (11). For both cohorts, the abundance of cells from the tumour microenvironment (TME) was compared across all clusters and only G3/G4 clusters. Kruskal-Wallis and Wilcoxon rank-sum tests were used to assess the significance of discovered differences. P-values were adjusted for multiple comparisons using Bonferroni correction.
Sankey plots were constructed for DEG comparison utilising the R-package networkD3 (12). DEGs were sorted by logFC (high to low). For

Single-cell RNA-seq analysis
Within the scope of the single-cell RNA-seq (scRNA-seq) analyses, data from two cohorts of MB patients were examined. First, we explored data from six human MB samples treated in Münster, referred to as MSMB. Second, we validated our findings using scRNA-seq data of 28 paediatric MB patients (abbreviated as RMB) published by Riemondy et al. (13).
After converting the raw 10X data to fastq format with CellRanger's mkfastq, the data 6 were aligned against the human reference transcriptome GRCh38 with the CellRanger count function using default values and the filtered feature matrices were tested for basic quality statistics with CellRanger's web summary function. Samples with a sequencing saturation of <50% after the first sequencing run were re-sequenced when possible.
Seurat objects were created per sample with a minimum of three cells and a minimum feature number of 200; furthermore, only cells with less than 25% of mitochondrial genes were considered further. Additionally, outlier cells with a high nCount_RNA value were considered to be doublets and removed accordingly; the threshold for removal varied per sample (7,000 to 80,000, mean=26833.33).
All filtered samples were then processed with Seurat's SCTransform function and default parameters. Dimensionality reduction was conducted per PCA, integration anchors were identified with Seurat, and the dataset was integrated using the previously calculated anchorset to remove batch effects.
Based on this integration, a clustering was created with Seurat's FindCluster function using a resolution parameter of 0.5 and default parameters otherwise. UMAPs were used to visualise cluster and sample distributions. Differential genes were calculated with Seurat's FindMarkers function for each cluster and analysed using QIAGEN IPA.
For Additional file 1: Fig. S10D, genes of significant canonical metabolic pathways were grouped depending on metabolic categories stated in IPA. Genes belonging to multiple categories were counted once per category. Categories containing more than 20% of all metabolic genes classified for the according cluster are presented in Additional file 1: Fig. S10E-G.  (Fig. 4B).

RMB
Functional analysis of mutated genes was performed using the ToppGene Suite.
In accordance with our bulk RNA analysis, we validated our findings using the second set of metabolic genes introduced above(8). Classification of single nucleotide variants was retrieved from Northcott et al. (27) for both sets of metabolic genes. The R package maxstat (parameters: smethod = "LogRank", pmethod = "condMC", minprop = 0.2) was then used to perform a maximally selected rank statistics analysis on the survival data and oncoscore in order to split the chosen RNA cohort into a "high" and "low" scoring group according to maxstat's optimised threshold value (28,29).

Maxstat analysis and survival curves
Survival curves were computed using survfit function (30) and then plotted with the R package survminer(6) and default parameters. Metabolic signatures, oncoscores, cutoff values and statistical data are provided in Additional file 9: Table S8.
Multivariate cox regression analysis was performed utilising the R packages survival(30) and survminer (6). The regression models included 87 samples of ICGC and 612 samples of the MAGIC cohort, respectively, and evaluated the effect of the variables IP and pyrimidine metabolism as well as the MB groups. In MAGIC cohort, 13 samples were automatically excluded throughout the process due to missing data.
For both metabolic pathways grouping by maxstat as "low gene expression" was used as the reference variable, and for the MB groups, WNT was considered the reference.
The proportional hazard assumption was tested for a regression model including both metabolic pathways and using cox.zph function (survival package). Both cohorts were stratified for MB groups. Survminer's ggcoxdiagnostics function was employed to visualise dfbeta values in order to test for influential observations. Observations exceeding a cutoff of 2/√n were identified as potentially influential. Two samples causing influential observations in ICGC cohort were not excluded in order to avoid further reduction of cohort size and number of events. There were no influential observations in MAGIC cohort. Forest plots were constructed with the R package forestplot (31).
Repetition of survival analysis using the median oncoscore as cut-off value for the division of cohorts was conducted as a control. Samples with exact median oncoscore have consistently been assigned to the "high gene expression" group.  Kaplan-Meier curves show the overall survival of patients from MAGIC and ICGC cohorts. Patients have been divided into two groups depending on RNA expression levels of genes involved in the metabolism of A) purines, B) oxidative phosphorylation, and C) pyruvate metabolism. Log-rank test was used to calculate p-values, and p<0.05 was considered significant. Grouping patients was performed by maximally ranked statistics but was not significant for the pathways shown here. Overall survival of patients from MAGIC and ICGC cohorts similar to Kaplan-Meier curves shown before ( Fig. 5; Additional file 1: Fig. S14) for A) pyrimidine, B) IP, C) purine metabolism, D) oxidative phosphorylation, and E) pyruvate metabolism. Division of patients into two groups depending on RNA expression levels of genes involved in previously analysed pathways has been done using the median gene expression level as cut-off value. Significance was calculated as mentioned above.